Abstract

This is a manual for the language-independent linearization engine, oxyGen.
This system has been developed as part of the Machine Translation (MT) effort
at the University of Maryland, College Park. oxyGen has been used as an
integral part of the Natural Language Generation (NLG) component of an
interlingual Chinese-English MT project and a Spanish-English MT project. It
has also been used to generate simple Spanish and Chinese sentences with
large-scale coverage. This manual includes an introduction to the language
oxyL and a reference manual complete with a sample working grammar for
English.


***The support of the LAMP Technical Report Series and the partial support of
this research by the National Science Foundation under grant EIA0130422 and
the Department of Defense under contract MDA9049-C6-1250 are gratefully
acknowledged.


Report Documentation Page
(Standard Form 298, Rev. 8-98; prescribed by ANSI Std Z39-18; OMB No. 0704-0188)

Report date: OCT 2001
Dates covered: 00-10-2001 to 00-10-2001
Title and subtitle: A Reference Manual to the Linearization Engine oxyGen Version 1.6
Performing organization: Language and Media Processing Laboratory, Institute
for Advanced Computer Studies, University of Maryland, College Park, MD,
20742-3275
Distribution/availability statement: Approved for public release; distribution unlimited
Security classification (report, abstract, this page): unclassified
Number of pages: 29


A  Reference  Manual 
to  the  Linearization  Engine 
oxyGen 
version  1.6 


Nizar  Habash 
University  of  Maryland 
Institute for Advanced Computer Studies


September 13, 2001


Contents

1 oxyGen
  1.1 Introduction
  1.2 Linearization
  1.3 oxyGen: A Hybrid System
2 oxyL
  2.1 Abstract Meaning Representation
    2.1.1 OxyL Basic Tokens
  2.2 oxyL File
  2.3 oxyL Rules
3 Sample oxyL Grammar for English
  3.1 The oxyL File
  3.2 Input and Output
4 oxyGen Reference
  4.1 oxyGen Package
    4.1.1 oxyGen Installation
    4.1.2 oxyCompile
    4.1.3 oxyRun
    4.1.4 oxyLin
    4.1.5 oxyDebug
  4.2 Declarations
  4.3 Built-in Functions
  4.4 Built-in Recasts
  4.5 Reserved Tokens
    4.5.1 Reserved Variables
    4.5.2 Reserved Roles
    4.5.3 Reserved Functions
    4.5.4 Reserved Strings


Chapter  1 


oxyGen 


1.1  Introduction 

This is a manual for the language-independent linearization engine, oxyGen.
This system has been developed as part of the Machine Translation (MT) effort
at the University of Maryland, College Park [1, 8]. oxyGen has been used as an
integral part of the Natural Language Generation (NLG) component of an
interlingual Chinese-English MT project and a Spanish-English MT project. It
has also been used to generate simple Spanish and Chinese sentences with
large-scale coverage [3]. Natural Language Generation is concerned with taking
non-linguistic representations as input and converting them into natural
language output. NLG can be divided into two major distinct operations:
Lexical Selection and Linearization. The former is concerned with selecting
the correct natural language lexical item, such as eat versus devour or car
versus vehicle. The latter is concerned with the relative positioning of
lexical items on the surface, such as man hit dog versus dog hit man or man
dog hit. oxyGen is an engine for developing programs that perform the latter
operation, Linearization. The input to such programs is a labeled Feature
Graph (FG) representation of a natural language sentence. The particular form
of FGs used here is a modified version of Nitrogen's Abstract Meaning
Representation (AMR) [5, 6]. AMRs are labeled directed feature graphs written
using the syntax of the Penman Sentence Plan Language [4]. The output of the
linearization programs developed using oxyGen is a word lattice, a compressed
representation of the various possible generated sequences. See Figure 1.1.


Figure 1.1: oxyGen Linearizer (diagram labels: oxyGen Linearizer, Word Lattice)
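
To make the phrase "compressed representation" concrete, here is a minimal Python sketch. It is not oxyGen's lattice format: it assumes a simplified lattice with one list of alternative words per position (a real word lattice is a general directed graph), and the names `lattice`, `count_sequences` and `expand` are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical simplified lattice: one list of alternative words per position.
# A real word lattice is a directed graph; this slot form is only illustrative.
lattice = [["man", "men"], ["hit", "hits"], ["dog", "dogs"]]

def count_sequences(slots):
    """Number of distinct word sequences the lattice encodes."""
    return prod(len(slot) for slot in slots)

def expand(slots):
    """Spell out every sequence, which is what the lattice avoids storing."""
    return [" ".join(words) for words in product(*slots)]

# Words stored in the lattice versus sequences it stands for.
stored_words = sum(len(slot) for slot in lattice)
print(stored_words, count_sequences(lattice))
print(expand(lattice)[:3])
```

The number of stored words grows with the sum of the alternatives, while the number of encoded sequences grows with their product, which is why passing a lattice to a ranking module is cheaper than passing an explicit list.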
1.2 Linearization


To exemplify the use of oxyGen and linearization in general, take the
following input AMR:

(1)
    (1 / |like|
       :POS Verb
       :Subject (2 / |man| :POS Noun)
       :Object (3 / |car| :POS Noun))

This AMR can be read as: there is a verb, like, and it has a subject, man,
which is a noun, and an object, car, which is also a noun. In English, a
proper word order would be man like car (or more fluently the man likes the
car, but let's not worry about fluency for now). To specify that an SVO
(subject verb object) order is desired in English (versus VSO or SOV), we need
a linearization rule such as the following:

(2)
    (?? (&eq @pos Verb) -> (@subject @/ @object)
                        -> (@/))


This rule is written using oxyL (oxyGen Language), a flexible and powerful
language that has the power of a programming language but focuses on natural
language realization. This rule can be read as: if the part of speech (POS) of
the current AMR is Verb, then linearize the subject AMR followed by the word
instance followed by the object AMR; otherwise linearize the word instance by
itself. This is a very simple grammar that needs more extensions to handle
real input with different phrase structures and parts of speech. But a real
AMR is also complex on a different dimension: ambiguity. Let's assume the
input AMR is the result of a lexical selection process for the same sentence
as in (1), from a language that doesn't specify number (singular versus
plural) and whose word for like is ambiguous in that it covers the concepts of
desire and love. This AMR could look as follows:


(3)
    (0 :OR (1 / (*or* |like| |likes|)
              :POS Verb
              :Subject (2 / (*or* |man| |men|) :POS Noun)
              :Object (3 / (*or* |car| |cars|) :POS Noun))
       :OR (4 / (*or* |desire| |desires|)
              :POS Verb
              :Subject (5 / (*or* |man| |men|) :POS Noun)
              :Object (6 / (*or* |car| |cars|) :POS Noun))
       :OR (7 / (*or* |love| |loves|)
              :POS Verb
              :Subject (8 / (*or* |man| |men|) :POS Noun)
              :Object (9 / (*or* |car| |cars|) :POS Noun)))

Since such ambiguity can occur anywhere in an AMR, it presents a challenge to
writing simple linearization rules whose application is conditional upon
specific AMR role combinations at different depths. However, the beauty of
oxyGen is that it allows hiding the ambiguity of the input from the grammar
description so that both AMRs (1 and 3) can be linearized using the same
grammar rule in (2). Of course, the ambiguity of (3) will lead to a large set
of sequences:


(4)
    man like car      man desire car      man love car
    man like cars     man desire cars     man love cars
    man likes car     man desires car     man loves car
    man likes cars    man desires cars    man loves cars
    men like car      men desire car      men love car
    men like cars     men desire cars     men love cars
    men likes car     men desires car     men loves car
    men likes cars    men desires cars    men loves cars
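
The way a single SVO rule yields all of these sequences can be sketched in Python. This is an illustration only, not oxyGen's implementation: AMRs are modeled as plain dictionaries, the OR-AMR as a dictionary with an "OR" list, and the helper names (`alternatives`, `linearize`, `clause`) are hypothetical.

```python
from itertools import product

def alternatives(value):
    """Return the surface words a terminal stands for."""
    # A tuple headed by "*or*" is a disjunction of words, as in (3).
    if isinstance(value, tuple) and value[0] == "*or*":
        return list(value[1:])
    return [value]

def linearize(amr):
    """Return every word sequence produced by the SVO rule in (2)."""
    if "OR" in amr:
        # OR-AMR: collect the sequences of each alternative AMR.
        return [seq for alt in amr["OR"] for seq in linearize(alt)]
    words = [[w] for w in alternatives(amr["inst"])]
    if amr.get("POS") == "Verb":
        # Subject, then the word instance, then the object.
        parts = (linearize(amr["Subject"]), words, linearize(amr["Object"]))
        return [s + v + o for s, v, o in product(*parts)]
    return words  # any other part of speech: the instance by itself

def clause(*verb_forms):
    """Build one alternative of (3) for the given verb forms."""
    return {
        "inst": ("*or*",) + verb_forms,
        "POS": "Verb",
        "Subject": {"inst": ("*or*", "man", "men"), "POS": "Noun"},
        "Object": {"inst": ("*or*", "car", "cars"), "POS": "Noun"},
    }

amr3 = {"OR": [clause("like", "likes"),
               clause("desire", "desires"),
               clause("love", "loves")]}
sequences = [" ".join(seq) for seq in linearize(amr3)]
print(len(sequences))
print(sequences[:4])
```

The rule itself never mentions *or* or :OR; the ambiguity is handled entirely in `alternatives` and in the first branch of `linearize`, which mirrors the separation between grammar and ambiguity handling described above.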

A statistical extraction module can be used to rank the different sequences
using unigram and bigram statistics or other language models. The statistical
extraction component of Nitrogen [5, 6] is one such module.

In addition to hiding ambiguity from the grammars, oxyGen gives grammar
writers, through oxyL, considerable power by providing complex tools designed
with natural language linearization in mind. oxyGen can also be extended and
modified easily via second- and third-party code.


1.3  oxyGen:  A  Hybrid  System 

oxyGen compiles target language grammars written in oxyL into compilable Lisp
programs that take AMRs as inputs and generate word lattices that can be
passed along to be ranked by some language model. This approach to
linearization implementation is a hybrid between the declarative and
procedural paradigms. oxyGen uses a linearization grammar description language
(oxyL) to write declarative grammar rules, which are then compiled into a
programming language (Lisp) for efficient performance. This hybrid approach
allows oxyGen to maximize the advantages and minimize the disadvantages of a
pure procedural implementation (in Lisp or C) or a pure declarative
implementation (in Nitrogen grammar). oxyGen contains three main elements: a
linearization grammar description language (oxyL), an oxyL to Lisp compiler
(oxyCompile) and a run-time support library (oxyRun). Target language
linearization grammars written in oxyL are compiled off-line into oxyGen
linearizers using oxyCompile (Figure 1.2).

oxyGen  linearizers  are  Lisp  programs  that  require  the  oxyRun  library  of 
basic  functions  in  order  to  execute  (Figure  1.3).  They  take  AMRs  as  input  and 
create  word  lattices  as  output. 

Figure 1.2: oxyGen Compilation Step (diagram labels: oxyL, Lisp, Lisp)

Figure 1.3: oxyGen Runtime Step (diagram labels: oxyRun, AMR, oxyGen
Linearizer, Word Lattice)

In addition to the oxyCompile and oxyRun components, there are currently two
additional components: oxyLin, a simple converter from word lattices to
surface sequences, and oxyDebug, support code for debugging the compiled
linearization grammars. The specifications of all these components are in
Chapter 4.

A more detailed discussion of the motivation and advantages of oxyGen is
presented in [2]. There is also an evaluation of oxyGen based on speed of
performance, size of grammar, expressiveness of the grammar description
language, reusability and readability/writability. The evaluation context is
provided by comparing an Oxygen linearization grammar for English to two other
implementations, one procedural (using Lisp) and one declarative (using the
Nitrogen linearization module). The three comparable linearization grammars
were used to calculate speed and size. Overall, Oxygen had the highest number
of advantages, and its only disadvantage, speed, ranked second to the Lisp
implementation (see Table 1.1). The version of oxyGen described in this manual
is a more efficient implementation of Oxygen than the one evaluated in [2]. A
second evaluation for a larger English grammar in oxyGen and Lisp showed that
Lisp is still faster than oxyGen. However, the gap in speed between the Lisp
and Oxygen implementations shrank from Oxygen being 24 times slower than Lisp
in [2] to only 1.5 times slower.


                           Procedural   Hybrid     Declarative
                           (Lisp)       (Oxygen)   (Nitrogen)
  Speed                    +            0          -
  Size                     0            +          -
  Expressiveness           +            +          -
  Reusability              -            +          +
  Readability/Writability  -            +          -

Table 1.1: Oxygen Evaluation
Chapter 2

oxyL 


oxyL (oxyGen Language) is the language used by oxyGen to write linearization
grammars. It is a flexible and powerful language that has the power of a
programming language but focuses on natural language realization. As a prelude
to describing the syntax of oxyL, we will describe the form of the structures
oxyL commands are applied to, Abstract Meaning Representations. Then, we will
discuss oxyL's basic tokens followed by the syntax of an oxyL file and oxyL
rules and functions.


2.1  Abstract  Meaning  Representation 

Abstract Meaning Representations (AMRs) are labeled directed feature graphs
written using the syntax of the Penman Sentence Plan Language [4]:

(5)
    <AMR>      ::= <terminal> || (<label> {<role> <value>}+)
    <value>    ::= <AMR> || <terminal>
    <terminal> ::= <word> || <wordlist>

Every node in an AMR has a label and one or more role-value pairs. Roles, i.e.
features, are marked by a colon prefix except for the default role, :inst
(instance), which can be represented as a forward slash /. Values may be
meaning-bearing terminal tokens or AMR nodes. These terminal tokens can be
semantic concepts such as |china| or |love|, syntactic categories such as N or
V, plain surface text strings such as "China", or a list of any of them headed
by the special token *or* such as (*or* man men). Except for a small number of
reserved tokens used by oxyGen, most of the AMR tokens are user- and
application-defined. The only requirement is consistency between the AMRs and
the oxyL grammars that linearize them. The roles and concepts of an AMR can be
a mix of syntactic and semantic significance: thematic roles such as :Agent
and :Theme and syntactic categories such as :Subject and ADV. The following is
an example of a basic AMR for the sentence The United States unilaterally
reduced the China textile export quota:


(6)
    (1 / |reduce|
       :CAT V
       :Subject (2 / |united states| :CAT N)
       :Object (3 / |quota|
                  :CAT N
                  :MOD (4 / |china| :CAT N)
                  :MOD (5 / |textile| :CAT Adj)
                  :MOD (6 / |export| :CAT Adj))
       :Manner (8 / |unilaterally| :CAT ADV))

In this example, (a2 / |united states| :CAT N) is the subject of the concept
|reduce|. Similarly, N is the category of the concept |united states|. The
basic role :inst or / is always present in a basic AMR.
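
As an illustration of this syntax, the following Python sketch reads an AMR written in the notation above into nested Python structures. It is a reader's aid under stated assumptions, not part of the oxyGen package: the function names, the dictionary layout, the lowercasing of role names, and the lack of error recovery are all choices made here.

```python
import re

# Tokens: parentheses, |concepts|, "strings", or any other run of characters.
TOKEN = re.compile(r'\(|\)|\|[^|]*\||"[^"]*"|[^\s()]+')

def parse_amr(text):
    """Parse one AMR; returns a terminal, an (*or* ...) tuple, or a node dict."""
    tokens = TOKEN.findall(text)
    value, pos = _parse_value(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing tokens after AMR")
    return value

def _parse_value(tokens, pos):
    if tokens[pos] != "(":
        return tokens[pos], pos + 1              # plain terminal
    pos += 1
    if tokens[pos] == "*or*":                    # word list: (*or* w1 w2 ...)
        words = []
        pos += 1
        while tokens[pos] != ")":
            words.append(tokens[pos])
            pos += 1
        return ("*or*", *words), pos + 1
    label, pos = tokens[pos], pos + 1            # node: (<label> {<role> <value>}+)
    roles = []
    while tokens[pos] != ")":
        role = tokens[pos]
        # "/" is shorthand for the instance role; other roles drop the colon.
        role = "inst" if role == "/" else role.lstrip(":").lower()
        value, pos = _parse_value(tokens, pos + 1)
        roles.append((role, value))
    return {"label": label, "roles": roles}, pos + 1

amr = parse_amr("(1 / |like| :POS Verb "
                ":Subject (2 / |man| :POS Noun) "
                ":Object (3 / |car| :POS Noun))")
print(amr["roles"][0])
```

Roles are kept as an ordered list of pairs rather than a dictionary so that a node with several copies of the same role, such as the three :MOD roles in the example above, is preserved.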

However, there are two other types of AMRs that are instance-less: OR-AMR and
AND-AMR. The first is a disjunction of basic AMRs, whereas the second is a
conjunction of basic AMRs. Both are constructed using multiple copies of the
same special role (:OR or :AND). An OR-AMR expresses lexical ambiguity, i.e.,
which structure to choose among many. For example, a variant of the above AMR
in which the root concept is three-way ambiguous would look as follows at the
top node:


(7)
    (# :OR (# / |reduce| ...)
       :OR (# / |cut| ...)
       :OR (# / |decrease| ...))


An AND-AMR, on the other hand, expresses linearization ambiguity, i.e., how to
order the AMRs on the surface. The AMR in (6) expresses that ambiguity in the
AMR for quota, which contains three identical roles (:MOD). That same AMR can
be written using :ANDs as follows:


(8)
    (1 / |reduce|
       :CAT V
       :Subject (2 / |united states| :CAT N)
       :Object (3 / |quota|
                  :CAT N
                  :MOD (0 :AND (4 / |china| :CAT N)
                          :AND (5 / |textile| :CAT Adj)
                          :AND (6 / |export| :CAT Adj)))
       :Manner (8 / |unilaterally| :CAT ADV))


Handling :ANDs and :ORs is done automatically and is hidden from the
user-defined grammar. The ambiguity of an OR-AMR is passed on to the word
lattice, while AMRs under :ANDs are permuted to produce all possible
linearizations.
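
The permutation of AMRs under :AND can be pictured with a short Python sketch. It only illustrates the idea of enumerating every order; it says nothing about how oxyGen stores the result in the lattice, and placing the modifiers before the head word |quota| is an assumption made for the example.

```python
from itertools import permutations

def and_orders(children):
    """All surface orders of the AMRs grouped under one AND-AMR."""
    return [list(order) for order in permutations(children)]

# The three :MOD values of the AMR for |quota|.
modifiers = ["china", "textile", "export"]

# Assumption for illustration: modifiers are realized before the head noun.
orders = [" ".join(order + ["quota"]) for order in and_orders(modifiers)]
for line in orders:
    print(line)
```

Three modifiers give six orders; a ranking module can then prefer the fluent one among them.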
There is one more special role, :X-role. It is used to express role ambiguity,
i.e., cases where a value can fill one of two or more roles. For example, the
two AMRs in (9) both express the choice between the sentences John gave Paul a
gift and John gave a gift to Paul.


(9)
    (# / |give|
       :subj |john|
       :obj |gift|
       :X-role (# / X
                  :iObj |paul|
                  :PP (# / |to|
                         :obj |paul|)))

    (0 :OR (# / |give|
              :subj |john|
              :obj |gift|
              :iObj |paul|)
       :OR (# / |give|
              :subj |john|
              :obj |gift|
              :PP (# / |to|
                     :obj |paul|)))

These AMRs are different in that the first AMR expresses the ambiguity locally
as an ambiguous role (indirect object versus prepositional phrase), whereas
the second AMR expresses the ambiguity at the top level as two different AMRs
altogether. Handling :X-roles is done automatically and is hidden from the
users. They are expanded to full-fledged OR-AMRs.
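
The expansion of an :X-role into an OR-AMR can be sketched as follows. This is a hedged Python illustration of the transformation between the two AMRs in (9), not oxyGen's code: AMRs are modeled as lists of (role, value) pairs, labels are omitted, only the first :X-role of a node is handled, and `expand_x_role` is a hypothetical name.

```python
def expand_x_role(amr):
    """Turn the first :X-role of `amr` into an OR-AMR, one alternative per role."""
    x_nodes = [value for role, value in amr if role == "X-role"]
    if not x_nodes:
        return amr                       # nothing to expand
    # Role-value pairs shared by every alternative.
    shared = [(role, value) for role, value in amr if role != "X-role"]
    # Each non-instance pair under the X node is one candidate role.
    candidates = [(role, value) for role, value in x_nodes[0] if role != "inst"]
    return [("OR", shared + [candidate]) for candidate in candidates]

give = [
    ("inst", "give"), ("subj", "john"), ("obj", "gift"),
    ("X-role", [("inst", "X"),
                ("iObj", "paul"),
                ("PP", [("inst", "to"), ("obj", "paul")])]),
]
expanded = expand_x_role(give)
for role, alternative in expanded:
    print(role, alternative)
```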

2.1.1  OxyL  Basic  Tokens 

The function of different tokens in oxyL is marked through their form using a
prefix symbol: variables are prefixed with a dollar sign (e.g. $form, $tense),
role-names are prefixed with a colon (e.g. :agent, :cat) and functions are
prefixed with an ampersand (e.g. &eq, &ProperNameHash).

In addition to general functions (built-in or user-defined), oxyL has a
special class of functions called referential functions. These functions,
which are prefixed with an @ sign (e.g. @goal, @this), are used to access
values corresponding to specific roles of the current AMR. For example, @goal
returns the value corresponding to the role :goal. If the current AMR is (6)
in Section 2.1, @subject returns (a2 / |united states| :cat n). The value of
the instance role, /, is returned using the special referential functions @/
or @inst. A referential function can specify the path from the current AMR's
root to any value under it by concatenating the references along such a path.
For instance, if the current AMR is (6), @subject.cat returns N. If the
current AMR contains multiple instances of the same role, as with :MOD in (6),
the values are returned in an AND-AMR. For example, if the current AMR is (6),
@object.mod.inst returns (# :AND |china| :AND |textile| :AND |export|). Access
to the full current AMR is provided through the self-referential function
@this. For example, @this.subject is equal to @subject.
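
The behavior of referential paths can be mimicked in Python as follows. This is a sketch under assumptions, not oxyRun's implementation: the AMR is modeled as a list of (role, value) pairs with lowercase role names and no labels, and `ref` is a hypothetical helper that takes the path without the @ prefix.

```python
def ref(amr, path):
    """Follow a dotted path such as "object.mod.inst" from the root of `amr`."""
    roles = path.lower().split(".")
    if roles and roles[0] == "this":
        roles = roles[1:]                # "this" names the current AMR itself
    current = [amr]
    for role in roles:
        found = []
        for node in current:
            if isinstance(node, list):   # terminals have no roles to follow
                found.extend(value for r, value in node if r == role)
        current = found
    if not current:
        return None
    if len(current) == 1:
        return current[0]
    # Several values for the same role are returned together as an AND-AMR.
    return [("AND", value) for value in current]

amr6 = [
    ("inst", "reduce"), ("cat", "V"),
    ("subject", [("inst", "united states"), ("cat", "N")]),
    ("object", [("inst", "quota"), ("cat", "N"),
                ("mod", [("inst", "china"), ("cat", "N")]),
                ("mod", [("inst", "textile"), ("cat", "Adj")]),
                ("mod", [("inst", "export"), ("cat", "Adj")])]),
    ("manner", [("inst", "unilaterally"), ("cat", "ADV")]),
]
print(ref(amr6, "subject.cat"))
print(ref(amr6, "object.mod.inst"))
```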

The last oxyL basic token type is Macros, which are prefixed with a circumflex
(e.g. ^NP-NOM). Macros are treated like variables except that while variables
appear as is in the compiled grammar, macros are substituted in the compiler.
The use of macros makes the grammar description more concise. For example, if
a set of role-value pairs is very commonly used, such as (:Form NP :Case NOM),
they can be referred to using a single macro, ^NP-NOM.
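
Compile-time substitution of macros can be pictured with the Python sketch below. It assumes that a grammar form is held as a nested list of tokens and that a macro token is replaced as a whole by its definition; both the representation and the name `expand_macros` are illustrative assumptions, not a description of oxyCompile.

```python
def expand_macros(form, macros):
    """Replace every macro token in a nested list by its definition."""
    if isinstance(form, list):
        return [expand_macros(item, macros) for item in form]
    # A macro token is substituted as a whole; other tokens are kept as is.
    return macros.get(form, form)

macros = {"^NP-NOM": [":Form", "NP", ":Case", "NOM"]}
source_form = ["@subject", "++", "^NP-NOM"]
compiled_form = expand_macros(source_form, macros)
print(compiled_form)
```

After expansion no macro token remains, whereas a variable such as $form would be left in place for evaluation at run time.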

2.2  oxyL  File 

An oxyL file contains a set of declarations (see Table 2.1). Some provide
meta-level information such as :Language and :Comment, while others allow
importing Lisp code such as :Include and :Code. The declarations :Class,
:Global and :Macro define variables for use by oxyCompile or oxyRun. The
declaration :Morph allows the user to link the internal morphology handler to
a specific user-defined morphology function. The declaration :Debug allows the
user to turn on and off the debugging utility provided by oxyDebug. The
declaration :Recast allows the user to define functions for modifying AMRs
using a special class of oxyL functions called recasts. The declaration :Rule
allows the user to define specific modules to handle different phenomena such
as the different types of phrases. The most important and the only obligatory
declaration is :MainRule, which defines the core of the grammar^1. The next
section describes the structure of an oxyL rule. The details of the use of all
other declarations are left to Chapter 4.

^1 In [2], a single declaration, :RULES, was available for the whole grammar.
It has since been replaced with the declarations :Recast, :Rule and :MainRule,
which provide a higher level of modularity and efficiency.




  Declaration  Function                      Example
  :Comment     Adds a comment                :Comment "Hello World!"
  :Language    Name of generated grammar     :Language "English"
  :Include     Lisp file to load at runtime  :Include "EnglMorph.lisp"
  :Code        User-defined Lisp functions   :Code ( <lisp-code> )
  :Class       Defines a class of roles      :Class :THETA (:AG :TH :GOAL)
  :Global      Declares a global variable    :Global $MODE HTML
  :Macro       Declares a macro              :Macro ^3pS (:per 3 :num sing)
  :Debug       Controls debugging mode       :Debug nil
  :Morph       Defines the morphological     :Morph (&morph @word @morphemes)
               generation function
  :Recast      Defines a recast              :Recast &PL (@this ++ (:num PL))
  :Rule        Defines a rule                :Rule %S (-> (@S @V @O))
  :MainRule    Defines the main function     :MainRule ((-> (do %XP)))

Table 2.1: oxyL Declarations


2.3  oxyL  Rules 


(10)
    <RULE>     ::= ( [== <ASSIGN>]
                     {?? <COND>
                      -> <RESULT>}*
                     [-> <RESULT>] )
    <ASSIGN>   ::= ((<variable> <value>)+)
    <COND>     ::= <Boolean Expression>
    <RESULT>   ::= <RULE> || <SEQUENCE> ||
                   (DO <RULE-NAME> [<AMR>])
    <SEQUENCE> ::= ({<AMR> || <RECAST>}+) ||
                   (OR <SEQUENCE> <SEQUENCE>+) ||
                   (LISP <lisp-code>) || (CODE <lisp-code>)
    <RECAST>   ::= (<AMR> <RECAST-OP> <RECAST-OP-ARGS>+)



The above BNF describes the syntax of an oxyL rule. A rule has an optional
assignment section, introduced with ==, in which local variables are defined.
The second part of a rule is an optional condition and result pair that can be
repeated multiple times. Conditions are introduced with ?? and results are
introduced with ->. Finally, an optional result is allowed as the default when
all conditions fail. A result can be a rule in itself with all of the portions
described above, or it can be a sequence of AMRs or AMR-returning tokens such
as variables or functions. It can also be a call to a user-defined rule using
the special operator DO, which takes a rule name and an optional AMR argument
that defaults to @this. The ability to embed rules within rules and declare
local variables with deep scope allows users to limit the size of the grammar
and increase the speed of its application logarithmically. The linear order of
AMRs in the result specifies the linear order of the surface forms
corresponding to these AMRs. The grammar is run recursively over each one of
the different AMRs. This process continues until terminal values, i.e. surface
forms, are reached. Consider the following oversimplified rule:


(11)
    (== (($form @form))
     ?? (&eq $form S)
     -> (?? (&eq @voice Passive)
         -> (@object (&passivize @inst) "by" @subject)
         -> (@subject @inst @object)))



Initially, this rule takes the value of the role :form in the current AMR and
assigns it to the variable $form. If the value of $form equals S, a second
check on the voice of the current AMR is done. If the voice is passive, the
passive word order is realized. Otherwise, the active voice word order is
realized. The grammar is then called recursively over the AMRs of @subject,
@object and @inst. The function &passivize takes the AMR of @inst as input and
can return either a passive verb AMR that gets processed by the grammar or a
terminal word sequence. In addition to AMRs, a linearization sequence can
contain AMR recast operations. A recast operation is made out of an AMR
followed by one or more pairs of recast operator and recast operator
arguments. Recast operations modify AMRs before they are recursively run
through the grammar. The recast mechanism is very useful in restructuring the
current AMR or any of its components. For example, the ++ recast operator adds
role-value pairs to an AMR. This is useful in cases such as adding case
marking roles on the subject and object AMRs. The rule described above, (11),
could be modified to specify case as follows:


(12)
    (== (($form @form))
     ?? (&eq $form S)
     -> (?? (&eq @voice Passive)
         -> ((@object ++ (:case nom)) (&passivize @inst)
             "by" (@subject ++ (:case gen)))
         -> ((@subject ++ (:case nom)) @inst
             (@object ++ (:case acc)))))
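
The control flow of rule (12) can be paraphrased in Python as below. This is a reading aid, not generated code: AMRs are modeled as dictionaries, `add_roles` stands in for the ++ recast, `to_passive` is a hypothetical stand-in for &passivize, and returning None when $form is not S is an assumption, since the rule as written has no default result.

```python
def add_roles(amr, **pairs):
    """Sketch of the ++ recast: return a copy of `amr` with extra roles."""
    return {**amr, **pairs}

def order_clause(amr, passivize):
    """Python reading of rule (12); returns the ordered result sequence."""
    form = amr.get("form")                      # == (($form @form))
    if form != "S":                             # ?? (&eq $form S)
        return None                             # assumption: no default result
    if amr.get("voice") == "Passive":           # ?? (&eq @voice Passive)
        return [add_roles(amr["object"], case="nom"),
                passivize(amr["inst"]),
                "by",
                add_roles(amr["subject"], case="gen")]
    return [add_roles(amr["subject"], case="nom"),   # default: active order
            amr["inst"],
            add_roles(amr["object"], case="acc")]

def to_passive(verb):
    """Hypothetical stand-in for &passivize."""
    return "was " + verb

sentence = {"form": "S", "voice": "Passive", "inst": "hit",
            "subject": {"inst": "man"}, "object": {"inst": "dog"}}
passive_result = order_clause(sentence, to_passive)
active_result = order_clause({**sentence, "voice": "Active"}, to_passive)
print(passive_result)
print(active_result)
```

In oxyGen each element of the returned sequence would itself be run through the grammar; the sketch stops at the first level to keep the ordering and the case marking visible.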


Table 2.2 provides a list of some oxyL recast operators with their usage
formalism and functionality. Note that the use of / in recast operations is
different from its role as a shorthand for :inst.

  Name              Op   Usage
  Add               ++   (<AMR> ++ (<role> <value>+))
                         Add all <role>_i-<value>_i pairs to AMR
  Delete            --   (<AMR> -- (<role>+))
                         Remove all <role>_i-<value>_i pairs
  Replace           &&   (<AMR> && (<role> <value>+))
                         Replace all values of <role>_i
  Simple Recast     <<   (<AMR> << (<new-role> / (<role>+)))
                         Rename all existing <role>_i as <new-role>
  Hierarchy Recast  <!   (<AMR> <! ((<new-role>+) / (<role>+)))
                         Hierarchically rename available <role>_i as <new-role>_i

Table 2.2: oxyL Recast Operators

Multiple recast operators can be listed one after another in the same recast.
A recast can also be embedded in another recast. For example, the recast
(@this && (:a (@a ++ (:b @b))) -- (:b)) moves the role :b and its value under
:b's sister :a using three different recast operations. Recasts can also be
accessed outside of results using the general recast function (& <recast>).
This allows recasting an AMR anywhere before passing it to another function or
rule. For example, (do %V (& @this ++ (:punct "."))) adds a punctuation mark
before passing the current AMR to the rule %V.
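
The effect of the recast operators can be approximated in Python as shown below. These functions are a sketch, not oxyRun's definitions: AMRs are modeled as lists of (role, value) pairs, all function names are hypothetical, and the hierarchy recast follows one reading of "hierarchically rename", namely that the roles actually present, taken in the listed order, receive the new role names in order.

```python
def add(amr, pairs):                           # ++
    return amr + list(pairs)

def delete(amr, roles):                        # --
    return [(r, v) for r, v in amr if r not in roles]

def replace(amr, role, value):                 # &&
    return delete(amr, [role]) + [(role, value)]

def simple_recast(amr, new_role, roles):       # <<
    return [(new_role if r in roles else r, v) for r, v in amr]

def hierarchy_recast(amr, new_roles, roles):   # <!  (assumed semantics)
    present = [r for r in roles if any(r == role for role, _ in amr)]
    mapping = dict(zip(present, new_roles))
    return [(mapping.get(r, r), v) for r, v in amr]

def value_of(amr, role):
    """First value of `role`, standing in for a referential function."""
    return next(v for r, v in amr if r == role)

# (@this && (:a (@a ++ (:b @b))) -- (:b)): move :b under its sister :a.
this = [("inst", "x"), ("a", [("inst", "p")]), ("b", "q")]
moved = delete(
    replace(this, "a", add(value_of(this, "a"), [("b", value_of(this, "b"))])),
    ["b"],
)

verb = [("inst", "eat"), ("have", "have"), ("being", "being")]
auxiliaries = hierarchy_recast(
    verb, ["aux1", "aux2", "aux3", "aux4"], ["aux", "have", "be", "being"])
print(moved)
print(auxiliaries)
```

Every function returns a new list and leaves its argument untouched, which matches the description of recasts as modifying an AMR before it is passed on to the grammar.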

A result can also introduce alternative sequences using the special operator
OR or make direct calls to Lisp functions using the special operator LISP (or
CODE). The following example contains both OR and LISP operators:

(13)
    (== (($name @name))
     -> (OR (LISP (FORMAT nil "~a loves me" $name))
            (LISP (FORMAT nil "~a hates me" $name))))


Note  that  calls  to  Lisp  functions  should  return  AMRs  (including  strings)  for 
proper  operation. 

The special main rule declared with :MainRule consists of a list of regular
rules. For example, the following main rule does one of two things every time
it is accessed: terminate generation by realizing nothing if the instance of
the current AMR is nil or *empty*, or pass the current AMR to the X-bar rule
%XP.

    :MainRule (
       ;; nothing to generate
       (?? (&in @inst (|nil| |*empty*|))
        -> () )

       ;; Basic rule, go to XP
       (-> (do %XP)))
Chapter 3


Sample  oxyL  Grammar  for 
English 

This chapter presents a simple oxyL grammar that is used to linearize English
syntactic dependency trees. The tokens used here are derived from the
categories and relations in Dekang Lin's Minipar parser [7]. Sample input AMRs
and outputs using oxyLin and Nitrogen's statistical extraction module are also
presented.


3.1  The  oxyL  File 

(

:Language "Simple Inflected English Dependency"

:Comment "This is an oxyGen grammar for English Generation"
:Comment "version 1.0 / September 2001"

:Include "nitrolin.lisp"

:Debug nil

:Global $V (V VBE V_I V_N V_P V_N_A V_N_C V_N_I V_N_N V_C_N V_N_N_A
            V_N_N_C V_N_N_P V_N_P_C V_N_N_P_A V_N_N_P_C V_N_N_P_N
            V_N_N_N XSAID SAID SAIDX)

:Global $N (N M NUM N_A N_C N_P)

:Macro ^no-punct (:punct (a / |nil|))

:Class :sub (:S :SUBJ)

:Class :as (:AS-ARG :AS-HEAD :AS1)

:Class :REST (:ABBREV :AGE :C :CN :DEST :PC :HEAD :I :INSIDE :LEX-DEP
              :LOCATION :POSS :SC :SPELLOUT :TITLE)
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iRecast  ftwhX  (@this  <<  (:wh  /  (:wha  :whn  :whp))) 

: Recast  ftinvX 

(@this  «  (:inv  /  (:IWV-AUX  : INV-BE  :INV-HAVE))) 

:  Recast  StAuxH 

(@this  <!  ((:auxl  : aux2  : aux3  :aux4)  /  ( : aux  :have  :be  : being))) 
:Rule  y,DET  (->  (@pre  @rest  @inst  @post)) 

:Rule %N
    (-> (@conj-word @det @num @mod @lex-mod @gen
         @inst
         @pnmod @person @appo-mod @appo @mod-post @comp1 @comp2
         @P @subcat (@vrel && ~no-punct) (@rel && ~no-punct)
         (@conj ++ (:conj-word "and"))))

:Rule %A
    (-> (@conj-word @rest @num @mod @lex-mod @amod @WW
         @inst
         @mod-post @P @subcat (@conj ++ (:conj-word "and"))))

:Rule %P
    (-> (@inst @rest @P-SPEC @Pcomp-N @PCOMP-C @subcat @punct))

:Rule %V
    (== (($to (0 / |to| :cond (&eq @tense inf))))

     ?? (&ex :inv)
     -> (@conj-word @wh @inv @neg @sub @aux1 @aux2 @aux3 $to
         @inst
         @lex-mod @obj @obj2 @desc @pred @AS @AS2 @P @BY-SUBJ
         @guest @rest (@MOD && ~no-punct) @subcat @punct
         (@conj ++ (:conj-word "and")))

     -> (@conj-word @wh @sub @aux1 @neg @aux2 @aux3 @aux4 $to
         @inst
         @lex-mod @obj @obj2 @desc @pred @AS @AS2 @P @BY-SUBJ
         @guest @rest (@MOD && ~no-punct) @subcat @punct
         (@conj ++ (:conj-word "and"))))

:Rule %V-punct
    (?? (&ex :punct)
     -> (do %V)

     ?? (&ex :WH)
     -> (do %V (& @this ++ (:punct "?")))

     ?? (&eq @aspect IMPERATIVE)
     -> (do %V (& @this ++ (:punct "!")))

     -> (do %V (& @this ++ (:punct "."))))

:Rule %XP
    (== (($pos @pos))

     ?? (&in $pos $V) -> (do %V-punct (&auxH (&invX (&whX @this))))
     ?? (&in $pos $N) -> (do %N)
     ?? (&eq $pos DET) -> (do %DET)
     ?? (&eq $pos prep) -> (do %P)
     ?? (&eq $pos A) -> (do %A)
     -> (@inst))

:MainRule (
    ;; nothing to generate
    (?? (&in @inst (|nil| |*trace*|))
     -> () )

    ;; Conditional generation technique
    (?? (&and (&ex :cond) (&null @cond))
     -> () )

    ;;Basic rule, go to XP
    (-> (do %XP)))

)


3.2  Input  and  Output 

The following are four AMRs that were input to the linearization grammar
described above. Each AMR is followed by oxyLin's output; the sentences in
parentheses are Nitrogen's top choice.

(5 / |organized|
   :POS V
   :S (3 / |contest|
         :POS N
         :DET (1 / |the| :POS DET)
         :M (2 / |writing| :POS N))
   :BE (4 / |was|)
   :BY-SUBJ (6 / |by|
         :POS PREP
         :PCOMP-N (8 / |office|
               :POS N
               :DET (7 / |the| :POS DET)
               :MOD-POST (9 / |of|
                     :POS PREP
                     :PCOMP-N (12 / |commissioner|
                           :POS N
                           :DET (10 / |the| :POS DET)
                           :MOD (11 / |official| :POS N))))))

(the writing contest was organized by the office of the official commissioner .)

(2 / |is|
   :POS VBE
   :WHA (1 / |how| :POS A)
   :PRED (6 / |system|
         :POS N
         :DET (3 / |the| :POS DET)
         :MOD (4 / |Canadian| :POS A)
         :MOD (5 / |legal| :POS A)
         :VREL (7 / |constituted| :POS V_N_N)))

how is the legal Canadian system constituted ?
(how is the Canadian legal system constituted ?)

(5 / |courses|
   :POS N
   :DET (1 / |the| :POS DET)
   :MOD (2 / |following| :POS A)
   :MOD (3 / |general| :POS A)
   :nn (4 / |education| :POS N))

the general following education courses
(the following general education courses)

(3 / |mind|
   :POS N
   :LEX-MOD (1 / |peace| :POS *)
   :LEX-MOD (2 / |of| :POS *)
   :MOD-POST (4 / |of|
         :POS PREP
         :PCOMP-N (7 / |operation|
               :POS N
               :MOD (5 / |continuous| :POS A)
               :nn (6 / |system| :POS N))))

of peace mind of continuous system operation
(peace of mind of continuous system operation)

Chapter 4


oxyGen Reference


4.1 oxyGen Package

4.1.1 oxyGen Installation

The oxyGen package contains the following files:

oxycompile.lisp
oxyrun.lisp
oxylin.lisp
oxydebug.lisp
oxyload.lisp

The code files for the different oxyGen modules. oxyload.lisp loads these
files.

make-oxygen-core.sh

A shell command for creating a dump of the oxyGen system. The created
dump file is called oxygen.core.

oxycompile

A shell command for compiling oxyL files from the prompt. oxycompile
needs oxygen.core to run properly.

Usage: oxycompile <oxyl-filename> <out>

Running oxycompile creates an <out>.core file and a shell command with
the name <out>. The usage of the created shell command is:

<out> <AMR-filename> <out-filename> <mode>

where the optional argument <mode> is a keyword for the word lattice to
surface module: oxylin or nitrolin. The default is oxylin.

oxypamr

A shell command for pretty-printing AMRs. oxypamr needs oxygen.core
to run properly.

Usage: oxypamr <amr-filename> <pretty-amr-filename>

nitrolin.lisp

Provides an interface between oxyGen and Nitrogen. This file needs to
be included in an oxyL grammar if it is to be used. nitroLin is activated
by setting the <mode> argument to nitrolin in the appropriate functions.

4.1.2  oxyCompile 

oxyCompile provides the functions necessary for compiling an oxyL grammar
into a Lisp file. oxyCompile can be accessed directly from the shell using the
shell command oxycompile described earlier.

(oxycompile <oxyl-grammar> <output-file>)

Compiles <oxyl-grammar> into a Lisp program and outputs it to <output-file>.
The optional <output-file> defaults to "oxyout.lisp".

(oxycompile-file <oxyl-file> <output-file>)

Compiles the oxyL grammar in <oxyl-file> into a Lisp program and
outputs it to <output-file>. The optional <output-file> defaults to
"oxyout.lisp".

4.1.3  oxyRun 

oxyRun provides the functions necessary for the proper operation of a compiled
oxyL grammar.

(oxygen <AMR>)

Runs the oxyGen linearization grammar on an <AMR> and returns a word
lattice.

(oxygen-file <AMR-file> <out-file> <mode>)

Runs the <AMR>s in <AMR-file> through the loaded oxyGen linearizer, followed
by the word lattice to surface module specified by the optional argument
<mode> (oxylin or nitrolin). The output sentences are printed to
<out-file>.

(&amrType <AMR>)

Returns the type of an AMR: word, wordlist, basicAmr, orAmr, andAmr, or
unknown.

4.1.4 oxyLin

oxyLin provides functions for realizing a word lattice into strings. It is an
alternative to using Nitrogen's Statistical Extraction module, which realizes
word lattices and assigns them uni/bigram scores.

(oxylin <word-lattice> <stream>)

Realizes <word-lattice> into strings and prints them to the output stream
<stream>. <stream> is optional and is standard output by default.

(check-size  <word-lattice>) 

Returns  the  number  of  independent  sequences  in  <word-lattice>  without 
realizing  it. 

4.1.5  oxyDebug 

oxyDebug provides functions for debugging a compiled oxyL grammar. Its
output is best compared to that of Lisp's trace. Besides helping to figure out
specific problems, the output of oxyDebug can be used to compare different
grammars in terms of efficiency by comparing the number of calls they make
to different functions. To use oxyDebug, an oxyL grammar should have the
declaration :Debug &true. This forces oxyCompile to add calls to oxyDebug in
the compiled grammar. Debugging can be deactivated by setting the global
variable *oxydebug* to nil.

(oxydb-open  <file>) 

Opens the file <file> and links it to the reserved output stream *oxydb-stream*.
<file> is an optional argument that defaults to "oxydb.out".

(oxydb-close) 

Closes  the  reserved  output  stream  *oxydb-stream*. 

(&oxydb  <format>  <var>) 

Allows  users  to  send  messages  to  *oxydb-stream*  from  inside  an  oxyL 
grammar.  <format>  is  a  string  that  can  include  Lisp’s  format  instruc¬ 
tions.  <var>  is  an  optional  variable. 

(oxy-pamr  <AMR>  <stream>) 

Pretty  prints  <AMR>  to  the  optional  output  stream  <stream>.  <stream> 
defaults  to  standard  output. 

(oxy-pamr-file <in-file> <out-file>)

Reads  AMRs  from  <in-file>  and  pretty  prints  them  to  <out-file>. 

4.2 Declarations

:COMMENT <string>

Includes the comment <string> in the compiled file. This declaration
produces no action. A Lisp comment (;;) can also be used in oxyL files.

Example

:COMMENT "This is a comment"

:LANGUAGE <string>

Specifies the name of the generated grammar. This declaration currently
acts like :COMMENT.

Example

:LANGUAGE "English"

:GLOBAL <variable> <value>

Declares a global variable <variable> and sets its value to <value>.

Example

:Global $mode HTML

:Global $articles ("a" "an" "the")

:CLASS :<class> (<role>+)

Declares a class role :<class> to represent all the roles in (<role>+).
A variable $<class> is created automatically for :<class>. The referential
function @<class> returns a basicAMR if only one of the roles in
(<role>+) exists; otherwise an andAMR of all existing roles is returned.
In both cases, the matching role is remembered in the returned value as the
value of the reserved role :role.

Example

:CLASS :THETA (:AGENT :THEME :SRC :GOAL :INSTRUMENT)

$THETA returns (:AGENT :THEME :SRC :GOAL :INSTRUMENT) and it can be
used in recasts such as (@this -- $THETA) or (@this << (:new / $THETA))

@THETA of (0 / x :AGENT ag :x x :y y)
returns (0 / ag :ROLE :AGENT)

@THETA of (0 / x :AGENT ag :THEME th :x x :y y)
returns (0 :AND (0 / ag :ROLE :AGENT) :AND (0 / th :ROLE :THEME))
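
The lookup performed by a class reference can be modelled outside oxyGen. The following Python sketch is only an illustration of the description above, not oxyGen's implementation. It assumes an AMR is a (label, inst, roles) tuple whose roles are an ordered list of (role, value) pairs, that role values are plain words, and that a reference with no match yields None; the name class_ref is hypothetical.

```python
# Illustrative model only (assumed representation, not oxyGen code):
# an AMR is (label, inst, roles); roles is an ordered list of (role, value).

def class_ref(amr, class_roles):
    """Model of @<class>: gather the values of every role in the class."""
    label, inst, roles = amr
    # Each hit is a basicAMR that remembers the matching role under :ROLE.
    hits = [(0, value, [(":ROLE", role)])
            for role, value in roles if role in class_roles]
    if not hits:
        return None                      # assumption: no match yields None
    if len(hits) == 1:
        return hits[0]                   # exactly one role exists: basicAMR
    # Several roles exist: wrap the hits in an andAMR, one :AND per hit.
    return (0, None, [(":AND", hit) for hit in hits])

THETA = (":AGENT", ":THEME", ":SRC", ":GOAL", ":INSTRUMENT")

one = class_ref((0, "x", [(":AGENT", "ag"), (":x", "x"), (":y", "y")]), THETA)
two = class_ref((0, "x", [(":AGENT", "ag"), (":THEME", "th"), (":x", "x")]), THETA)
print(one)
print(two)
```

The two calls mirror the two @THETA examples: the first produces a single basicAMR, the second an andAMR holding both matches.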

:MACRO ~<macro-name> <macro-body>

Declares a macro ~<macro-name> with the value <macro-body>. A macro
acts like a global variable except that it is substituted by its value at
compile time, not run time. The use of macros makes the grammar description
more concise.

Example

:MACRO ~NP-acc (:form NP :case acc)

(@this ++ ~NP-acc) is compiled as (@this ++ (:form NP :case acc))
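
Compile-time substitution of this kind can be pictured as a tree walk that runs before any rule is executed. The following Python sketch is an assumption-laden illustration, not oxyCompile's code: oxyL expressions are modelled as nested Python lists of strings, and expand_macros is a hypothetical helper.

```python
# Illustrative model only: oxyL expressions are nested lists of strings.

def expand_macros(expr, macros):
    """Replace every macro name found in expr by its declared body."""
    if isinstance(expr, list):
        return [expand_macros(item, macros) for item in expr]
    # A leaf token: substitute it if it names a macro, else keep it.
    return macros.get(expr, expr)

macros = {"~NP-acc": [":form", "NP", ":case", "acc"]}
source = ["@this", "++", "~NP-acc"]
compiled = expand_macros(source, macros)
print(compiled)
```

Because the substitution happens once, before run time, later changes to the macro table would not affect an already compiled expression; a global variable, by contrast, is looked up each time it is used.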

:CODE (<lisp-code>+)

Adds Lisp code to the oxyL file. :CODE can be used to declare functions
and variables. All user-defined functions must have the prefix & to run
correctly. Similarly, all non-local variables must have the prefix $.

Example

:CODE ((setf $myvariable '(me me me))
       (defun &even (x) (evenp x))
       (defun &odd (x) (oddp x))
       (defun &concat (string1 string2)
         (format nil "~a~a" string1 string2)))

:INCLUDE <file-name>

Loads the Lisp file named <file-name>. All user-defined functions and
variables must have the appropriate prefixes to run correctly.

Example

:INCLUDE "wordnet-data.lisp"

:INCLUDE "brown-corpus-stats.fasl"

:MORPH (<function> @word @morphemes)

Defines the morphology handling function for the system to access. <function>
is linked by oxyCompile to the internal morphology handler
(oxymorph @word @morphemes). oxymorph is fired by the morphology recast +-.
:MORPH links the arguments @word and @morphemes to the input arguments
of <function>.

Example

:MORPH (&english-morph @word @morphemes)

:RECAST <recast-name> <recast-body>

Allows the user to define a function <recast-name> for modifying AMRs
using oxyL's built-in recasts. Recasts are explained in Chapter 2.

Example

:Recast &move (@this && (:a (@a ++ (:b @b))) -- (:b))
moves the role :b and its value under :b's sister :a:

(&move (0 / x :a (1 / a) :b (2 / b)))
returns (0 / x :a (1 / a :b (2 / b)))

:RULE <rule-name> <rule-body>

Defines a rule <rule-name> as <rule-body>. The definition of oxyL rules
is explained in Chapter 2. Rules can be named anything, but it is
preferred that they have the prefix %. A rule can be activated with the
special operator DO, which takes an optional AMR as input. If no AMR is
given, the input defaults to @this.

Example

:Rule %order (-> (@c @b @a @b @c))

(DO %order (0 / x :a a :b b :c c))
yields c b a b c
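
The ordering effect of such a rule result can be imitated with a few lines of Python. This is a simplified sketch under stated assumptions, not how oxyGen realizes rules: role values are taken to be plain words, the output is a flat string instead of a word lattice, and the function name realize is hypothetical.

```python
# Illustrative model only: an AMR is (label, inst, roles) with word values.

def realize(amr, order):
    """Emit the values of the listed roles in the order the rule gives."""
    label, inst, roles = amr
    words = []
    for wanted in order:
        # A role may be referenced more than once, as in the example rule.
        words.extend(value for role, value in roles if role == wanted)
    return " ".join(words)

amr = (0, "x", [(":a", "a"), (":b", "b"), (":c", "c")])
result = realize(amr, [":c", ":b", ":a", ":b", ":c"])
print(result)
```

The point of the sketch is that the surface order comes from the rule result alone; the order of the roles inside the input AMR plays no part.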


:MAINRULE (<rule>+)

Defines the main function in an oxyL grammar. This is the only obligatory
declaration. The use of :MAINRULE is described in Chapter 2.

:DEBUG <boolean>

Controls the inclusion of the code necessary for debugging an oxyL grammar.


4.3  Built-in  Functions 

@<role-sequence>

Referential Function. Returns the value associated with the role at the
end of the <role-sequence> of @this. <role-sequence> is constructed
by listing the roles separated by periods and without the colon prefix.

Example

@this.subject.number returns the value of the role :number in the value
of the role :subject under the current AMR.
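
Following a role sequence amounts to walking down the AMR one role at a time. The Python sketch below models that walk under explicit assumptions that are not taken from the manual: an AMR is a (label, inst, roles) tuple, the first match is used when a role is repeated, and a missing role yields None. The helper name ref is hypothetical, and the leading "this" of the sequence is left implicit.

```python
# Illustrative model only: an AMR is (label, inst, roles).

def ref(amr, role_sequence):
    """Follow a period-separated role sequence down from amr."""
    current = amr
    for name in role_sequence.split("."):
        if not isinstance(current, tuple):
            return None                  # ran out of structure to descend into
        label, inst, roles = current
        if name == "inst":
            current = inst               # the reserved role gives the instance
            continue
        matches = [value for role, value in roles if role == ":" + name]
        if not matches:
            return None                  # assumption: a missing role yields None
        current = matches[0]             # assumption: use the first match
    return current

amr = (0, "sleep", [(":subject", (1, "dog", [(":number", "plural")]))])
print(ref(amr, "subject.number"))
print(ref(amr, "subject.inst"))
```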

(&  <recast>) 

General  Recast  Function.  Returns  the  result  of  executing  <recast>.  This 
special  function  allows  accessing  oxyL  built-in  recasts  as  regular  functions. 
This  is  useful  for  recasting  an  AMR  before  passing  it  as  an  argument  to  a 
rule  or  a  function.  This  function  cannot  be  used  in  a  rule  result. 

Example 

(do %NP (& (@Subject ++ (:case nom))))

(&ex  <token>  <AMR>) 

Returns  true  if  <token>  exists  in  <AMR>.  <token>  can  be  a  role  or  a  word. 
<AMR> is optional and it defaults to @this.

(&nex  <token>  <AMR>) 

Returns  true  if  <token>  doesn’t  exist  in  <AMR>.  <token>  can  be  a  role  or 
a word. <AMR> is optional and it defaults to @this.

(&eq {<value1> <value2>}+)

Returns true if all <value1>-<value2> pairs are equal.

(&neq {<value1> <value2>}+)

Returns true if all <value1>-<value2> pairs are not equal.

(&in <AMR> (<token>+))

If <AMR> is a word, &in returns true if <AMR> exists in (<token>+). If <AMR>
is a wordlist, &in returns true if any word in <AMR> exists in (<token>+). If
<AMR> is a basicAMR, &in returns true if <AMR>.inst exists in (<token>+).
If <AMR> is an orAMR or andAMR, &in returns true if any <AMR>.inst
exists in (<token>+).
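
The four cases of &in can be summarized in a short Python sketch. It is a model of the description above and not oxyGen's code; the data representation is assumed: a word is a string, a wordlist is a list of strings, a basicAMR is a (label, inst, roles) tuple, and an orAMR or andAMR is such a tuple with no inst whose alternatives sit under :OR or :AND roles. The name amr_in is hypothetical.

```python
# Illustrative model only; the AMR encoding is an assumption of this sketch.

def amr_in(amr, tokens):
    """Model of (&in <AMR> (<token>+)) for the four AMR types."""
    if isinstance(amr, str):                       # word
        return amr in tokens
    if isinstance(amr, list):                      # wordlist: any word matches
        return any(word in tokens for word in amr)
    label, inst, roles = amr
    if inst is not None:                           # basicAMR: test its inst
        return inst in tokens
    # orAMR / andAMR: test the inst of every alternative
    return any(amr_in(value, tokens)
               for role, value in roles if role in (":OR", ":AND"))

or_amr = (0, None, [(":OR", (1, "a", [])), (":OR", (2, "b", []))])
print(amr_in("nil", ["nil", "*empty*"]))
print(amr_in(or_amr, ["b"]))
```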


&true

Always returns T.

The following functions are implemented using their Lisp counterparts: &and
&eval &if &not &null &or &quote

4.4  Built-in  Recasts 

(<AMR> ++ ({<role> <value>}+))

Add Recast. Returns a copy of <AMR> with added <role>-<value> pairs.
Adding the reserved role :inst overwrites <AMR>.inst. If <AMR> is a word
or a wordlist, a basicAMR of the form (0 / <AMR> {<role> <value>}+)
is returned.

Examples

((0 / x :a a) ++ (:b b :c c)) returns (0 / x :a a :b b :c c)

("x" ++ (:d d)) returns (0 / "x" :d d)

((0 / x :a a) ++ (/ y :d d)) returns (0 / y :a a :d d)
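
The three examples can be reproduced with a small Python model of the Add Recast. This is a sketch under assumptions, not oxyRun's implementation: an AMR is a (label, inst, roles) tuple, roles are an ordered list of pairs, and the hypothetical function add_recast takes the new instance, if any, as a separate argument.

```python
# Illustrative model only: an AMR is (label, inst, roles).

def add_recast(amr, additions, new_inst=None):
    """Model of (<AMR> ++ (...)): return a copy with extra role-value pairs."""
    if not isinstance(amr, tuple):       # a word or wordlist is wrapped first
        amr = (0, amr, [])
    label, inst, roles = amr
    if new_inst is not None:             # adding the reserved inst overwrites it
        inst = new_inst
    return (label, inst, list(roles) + list(additions))

base = (0, "x", [(":a", "a")])
print(add_recast(base, [(":b", "b"), (":c", "c")]))
print(add_recast("x", [(":d", "d")]))
print(add_recast(base, [(":d", "d")], new_inst="y"))
```

The input is never modified; each call builds a new tuple, which matches the manual's wording that a copy is returned.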

(<AMR> -- (<role>+))

Delete Recast. Returns a copy of <AMR> with all the <role>-<value> pairs
of the listed <role>s removed. Deleting the reserved role :inst replaces the
<value> of :inst with nil.

Examples

((0 / x :a a :b b1 :b b2) -- (:b :z)) returns (0 / x :a a)

((0 / x :a a :b b :c c) -- (/ :c)) returns (0 / nil :a a :b b)
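
A Python model of the Delete Recast makes the two examples easy to check. As with any sketch of this kind, the representation is assumed (an AMR is a (label, inst, roles) tuple, nil is modelled as None) and the function name delete_recast and its drop_inst flag are hypothetical.

```python
# Illustrative model only: an AMR is (label, inst, roles); nil is None.

def delete_recast(amr, doomed, drop_inst=False):
    """Model of (<AMR> -- (<role>+)): drop every pair whose role is listed."""
    label, inst, roles = amr
    kept = [(role, value) for role, value in roles if role not in doomed]
    # Deleting the reserved inst role leaves the slot in place, set to nil.
    return (label, None if drop_inst else inst, kept)

first = delete_recast((0, "x", [(":a", "a"), (":b", "b1"), (":b", "b2")]),
                      [":b", ":z"])
second = delete_recast((0, "x", [(":a", "a"), (":b", "b"), (":c", "c")]),
                       [":c"], drop_inst=True)
print(first)
print(second)
```

Listing a role that is absent, such as :z in the first call, is harmless, and every occurrence of a repeated role is removed.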

(<AMR> && ({<role> <value>}+))

Replace Recast. Returns a copy of <AMR> with all values of <role> replaced
with <value>. If <role> doesn't exist in <AMR>, it is added. If <AMR> is a
word or a wordlist, a basicAMR of the form (0 / <AMR> {<role> <value>}+)
is returned.

Examples 

((0 / x :a a :b b1 :b b2) && (:b b3 :z z))
returns (0 / x :a a :b b3 :b b3 :z z)
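
The Replace Recast combines an in-place update with an append. The Python sketch below follows the prose description, in particular the statement that a role not present in the AMR is added; it is a model under assumptions (an AMR is a (label, inst, roles) tuple; replace_recast is a hypothetical name), not oxyRun's code.

```python
# Illustrative model only: an AMR is (label, inst, roles).

def replace_recast(amr, replacements):
    """Model of (<AMR> && (...)): overwrite listed roles, add missing ones."""
    if not isinstance(amr, tuple):       # a word or wordlist is wrapped first
        amr = (0, amr, [])
    label, inst, roles = amr
    new_values = dict(replacements)
    present = {role for role, _ in roles}
    # Every occurrence of a listed role receives the new value.
    out = [(role, new_values.get(role, value)) for role, value in roles]
    # Roles that did not occur at all are appended, per the description.
    out += [(role, value) for role, value in replacements
            if role not in present]
    return (label, inst, out)

result = replace_recast((0, "x", [(":a", "a"), (":b", "b1"), (":b", "b2")]),
                        [(":b", "b3"), (":z", "z")])
print(result)
```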

(<AMR>  <<  (<new-role>  /  (<role>+))) 

Simple  Recast.  Returns  a  copy  of  <AMR>  with  all  <role>s  renamed  as 
<new-role>. 

Examples 

((0 / x :a a :b b) << (:c / (:a :b))) returns (0 / x :c a :c b)
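
Renaming a set of roles to one new role is a single pass over the role list. The following Python sketch models the Simple Recast under the assumption that an AMR is a (label, inst, roles) tuple; simple_recast is a hypothetical name and the code is not taken from oxyRun.

```python
# Illustrative model only: an AMR is (label, inst, roles).

def simple_recast(amr, new_role, old_roles):
    """Model of (<AMR> << (<new-role> / (<role>+))): rename listed roles."""
    label, inst, roles = amr
    renamed = [(new_role if role in old_roles else role, value)
               for role, value in roles]
    return (label, inst, renamed)

result = simple_recast((0, "x", [(":a", "a"), (":b", "b")]), ":c", [":a", ":b"])
print(result)
```

In the sample grammar of Chapter 3 this is the kind of recast used to collapse several related roles into one before a rule refers to them.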

(<AMR> <? (<new-role> / (<role>+)) <cond>)

Conditional Recast. Returns a copy of <AMR> with all <role>s renamed as
<new-role> if <cond> is true. The special referential role @that should
be used in <cond> to access the value of each recastable <role> one at a
time. This is important when several <role>s share the same name.

Examples

((0 / x :a a :b b1 :b b2) <? (:c / (:a :b)) (&eq @that.inst b1))
returns (0 / x :a a :c b1 :b b2)

Conditionally recasts :a and :b into :c if the inst value of "that" recastable
role (:a or :b) equals b1.
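
The role-by-role evaluation of the condition can be modelled by passing each candidate value to a predicate. In the Python sketch below the predicate argument stands in for @that; the AMR encoding ((label, inst, roles) tuples with word values) and the name conditional_recast are assumptions of the sketch, and for a plain word the instance is taken to be the word itself.

```python
# Illustrative model only: an AMR is (label, inst, roles) with word values.

def conditional_recast(amr, new_role, old_roles, cond):
    """Model of <?: rename a listed role only where cond(value) holds."""
    label, inst, roles = amr
    out = []
    for role, value in roles:
        # cond sees one candidate value at a time, playing the part of @that.
        if role in old_roles and cond(value):
            role = new_role
        out.append((role, value))
    return (label, inst, out)

result = conditional_recast(
    (0, "x", [(":a", "a"), (":b", "b1"), (":b", "b2")]),
    ":c", [":a", ":b"],
    lambda that: that == "b1")
print(result)
```

Only the :b whose value is b1 is renamed; the other :b and the :a are candidates too, but the condition fails for them.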

(<AMR>  <!  ( (<new-role>+)  /  (<role>+))) 

Hierarchical  Recast.  Returns  a  copy  of  <AMR>  with  <role>s  hierarchically 
renamed  as  <new-role>s.  Hierarchical  renaming  means  that  the  first  ex¬ 
isting  <role>  is  renamed  to  the  first  <new-role>;  and  the  second  existing 
<role>  is  renamed  to  the  second  <new-role>;  and  so  on. 

Examples 

((0 / x :d d :g g :a a) <! ((:m :n) / (:a :b :c :d :e :f :g :h)))
returns (0 / x :n d :g g :m a)
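
Hierarchical renaming pairs the roles that actually exist, taken in the order of the <role> list, with the new roles in order. The Python sketch below models this under assumptions: an AMR is a (label, inst, roles) tuple, existing roles beyond the supply of new roles are left unchanged, and repeated occurrences of one role all receive the same new name. The name hierarchical_recast is hypothetical.

```python
# Illustrative model only: an AMR is (label, inst, roles).

def hierarchical_recast(amr, new_roles, old_roles):
    """Model of <!: the n-th existing listed role gets the n-th new role."""
    label, inst, roles = amr
    present = {role for role, _ in roles}
    # Keep the priority order of old_roles, but only for roles that exist.
    existing = [role for role in old_roles if role in present]
    # zip stops at the shorter list, so surplus roles keep their names.
    renaming = dict(zip(existing, new_roles))
    return (label, inst,
            [(renaming.get(role, role), value) for role, value in roles])

result = hierarchical_recast(
    (0, "x", [(":d", "d"), (":g", "g"), (":a", "a")]),
    [":m", ":n"],
    [":a", ":b", ":c", ":d", ":e", ":f", ":g", ":h"])
print(result)
```

Here :a is the first existing role and becomes :m, :d is the second and becomes :n, and :g is left alone because no third new role was supplied. The order of the pairs inside the AMR is preserved.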

(<AMR> <o <role>)

Order Recast. Returns a copy of <AMR> with its <role>s renamed as
<role>-i, where i enumerates the order in which <role> appears in <AMR>.

Example

((3 / x :a (1 / a) :b (2 / b1) :b (4 / b2)) <o :b)
returns (3 / x :a (1 / a) :b-1 (2 / b1) :b-2 (4 / b2))

(<AMR> <on <role>)

Label Order Recast. Returns a copy of <AMR> with its <role>s renamed as
<role>-i, where i is the node label of the value of <role>.

Example

((3 / x :a (1 / a) :b (2 / b1) :b (4 / b2)) <on :b)
returns (3 / x :a (1 / a) :b-2 (2 / b1) :b-4 (4 / b2))

(<AMR> <oi <role>)

Relative Order Recast. Returns a copy of <AMR> with its <role>s renamed
as <role>-i or <role>+i, where i is the absolute difference between the
node label number of the value of <role> and the node label number
of <AMR>. + is used for a positive difference and - for a negative difference.
This recast expects the node labels to be positive integers.

Example

((3 / x :a (1 / a) :b (2 / b1) :b (4 / b2)) <oi :b)
returns (3 / x :a (1 / a) :b-1 (2 / b1) :b+1 (4 / b2))
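
The three order recasts differ only in how the suffix is computed, which the following Python sketch makes explicit. It is a model under assumptions, not oxyRun's code: an AMR is a (label, inst, roles) tuple with integer labels, the values of the recast role are themselves such tuples, and the function name order_recast and its mode argument are hypothetical.

```python
# Illustrative model only: an AMR is (label, inst, roles), labels are integers.

def order_recast(amr, target, mode="o"):
    """Model of <o, <on and <oi applied to the role named target."""
    label, inst, roles = amr
    out, seen = [], 0
    for role, value in roles:
        if role == target:
            seen += 1
            if mode == "o":              # position among the repeated roles
                role = f"{target}-{seen}"
            elif mode == "on":           # node label of the role's value
                role = f"{target}-{value[0]}"
            else:                        # "oi": signed distance from the parent
                diff = value[0] - label
                role = f"{target}{'+' if diff > 0 else '-'}{abs(diff)}"
        out.append((role, value))
    return (label, inst, out)

amr = (3, "x", [(":a", (1, "a", [])), (":b", (2, "b1", [])), (":b", (4, "b2", []))])
for mode in ("o", "on", "oi"):
    print(mode, [role for role, _ in order_recast(amr, ":b", mode)[2]])
```

With the parent labelled 3, the value labelled 2 lies one position before it and the value labelled 4 one position after it, which is what the suffixes -1 and +1 of the relative variant record.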

(<AMR> +- <morpheme>)

Morphology Recast. Returns a string that is the result of combining <AMR>
with <morpheme>. This recast fires the internal morphology handler oxymorph,
which is linked to a user-defined morphology function through the oxyL
declaration :MORPH. The form of the <AMR> (i.e. word, wordlist, basicAMR,
etc.) and the form of <morpheme> (i.e. word, list of words, even an AMR)
are entirely up to the user-defined morphology function.

Example

:MORPH (&english-morph @word @morphemes)

("walk" +- past)
returns "walked"


4.5  Reserved  Tokens 

Since oxyL files are compiled into Lisp by a program written in Lisp using
supporting Lisp functions, the oxyGen user must not redefine any of the
variables and functions that are necessary for the proper operation of the
system. The following is a list of all the reserved tokens in the oxyGen
system.

4.5.1  Reserved  Variables 

*oxycompile-class*  *oxycompile-local*  *oxycompile-debug* 
*oxydb-stream*  *oxydebug*  $this  $that 

4.5.2  Reserved  Roles 

:INST :OR :AND :X-ROLE :ROLE (:THIS :THAT)

4.5.3  Reserved  Functions 

oxyL  Functions 

@this @that oxymain oxymorph @inst @/

(@or @and @x-role @role)

oxyCompile

oxycompile-file load-oxyl-file init-oxycompile oxycompile
print-compiled-grammar remove-oxydebug compile-grammar
compile-grammar-1 compile-grammar-def-recast
compile-grammar-def-rule compile-grammar-main-rule
compile-grammar-rule compile-grammar-set compile-grammar-conds
compile-grammar-conds-eq compile-grammar-conds-neq
compile-grammar-results compile-recast separate-roles amrp
compile-term variable-term local-term reserved-func-term
compiled-reserved-func-term role-term roleseq roleseq-1
roleseq-2 tokens

oxyRun 

add-roles  normalize-inst  del-roles  del-roles-1  replace-roles 
sub-roles  sub-roles-1  sub-roles-cond  sub-roles-cond-1 
sub-roles-hierarchy  sub-roles-h-1  sub-order-role 
sub-order-role-node  sub-order-role-inode  exval  exval-1 
inval  inval-1  valof  valof-1  prepare  &amrType  subamr-roles 
permute  permute-1  multiply-X-roles  get-x-roles  del-x-roles 
multiply-subamr  oxygen  oxygen-file 

oxyLin 

oxylin gls-to-surface gls-to-surface-1 add-on check-size
format-surface format-surface-sentence

oxyDebug 

oxydb-open oxydb-close oxydebug &oxydb oxy-pamr oxy-pamr1
oxy-pamr2 oxy-pamr-file

4.5.4  Reserved  Strings 

"*start-sentence*"  "*end-sentence*"  "*empty*" 